Be honest: how does the following code make you feel?
```cpp
std::vector<std::string> get_names();
…
std::vector<std::string> const names = get_names();
```
Frankly, even though I should know better, it makes me nervous. In principle, when get_names() returns, we have to copy a vector of strings. Then, we need to copy it again when we initialize names, and we need to destroy the first copy. If there are N strings in the vector, each copy could require as many as N+1 memory allocations and a whole slew of cache-unfriendly data accesses as the string contents are copied.
Rather than confront that sort of anxiety, I’ve often fallen back on pass-by-reference to avoid needless copies:
```cpp
void get_names(std::vector<std::string>& out_param);
…
std::vector<std::string> names;
get_names(names);
```
Unfortunately, this approach is far from ideal.
- The code grew by 150%.
- We've had to drop const-ness because we're mutating names.
- As functional programmers like to remind us, mutation makes code more complex to reason about by undermining referential transparency and equational reasoning.
- We no longer have strict value semantics1 for names.
But is it really necessary to mess up our code in this way to gain efficiency? Fortunately, the answer turns out to be no (and especially not if you are using C++0x). This article is the first in a series that explores rvalues and their implications for efficient value semantics in C++.
Rvalues
Rvalues are expressions that create anonymous temporary objects. The name rvalue refers to the fact that an rvalue expression of builtin type can only appear on the right-hand side of an assignment. Unlike lvalues, which, when non-const, can always be used on the left-hand side of an assignment, rvalue expressions yield objects without any persistent identity to assign into.2
The important thing about anonymous temporaries for our purposes, though, is that they can only be used once in an expression. How could you possibly refer to such an object a second time? It doesn’t have a name (thus, “anonymous”); and after the full expression is evaluated, the object is destroyed (thus, “temporary”)!
Once you know you are copying from an rvalue, then, it should be possible to "steal" the expensive-to-copy resources from the source object and use them in the target object without anyone noticing. In this case that would mean transferring ownership of the source vector's dynamically-allocated array of strings to the target vector. If we could somehow get the compiler to execute that "move" operation for us, it would be cheap, almost free, to initialize names from a vector returned by value.
That would take care of the second expensive copy, but what about the first? When get_names returns, in principle, it has to copy the function's return value from the inside of the function to the outside. Well, it turns out that return values have the same property as anonymous temporaries: they are about to be destroyed, and won't be used again. So, we could eliminate the first expensive copy in the same way, transferring the resources from the return value on the inside of the function to the anonymous temporary seen by the caller.
Copy Elision and the RVO
The reason I kept writing above that copies were made “in principle” is that the compiler is actually allowed to perform some optimizations based on the same principles we’ve just discussed. This class of optimizations is known formally as copy elision. For example, in the Return Value Optimization (RVO), the calling function allocates space for the return value on its stack, and passes the address of that memory to the callee. The callee can then construct a return value directly into that space, which eliminates the need to copy from inside to outside. The copy is simply elided, or “edited out,” by the compiler. So in code like the following, no copies are required:
```cpp
std::vector<std::string> names = get_names();
```
Also, although the compiler is normally required to make a copy when a function parameter is passed by value (so modifications to the parameter inside the function can’t affect the caller), it is allowed to elide the copy, and simply use the source object itself, when the source is an rvalue.
```cpp
std::vector<std::string>
sorted(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end());
    return names;
}

// names is an lvalue; a copy is required so we don't modify names
std::vector<std::string> sorted_names1 = sorted(names);

// get_names() is an rvalue expression; we can omit the copy!
std::vector<std::string> sorted_names2 = sorted(get_names());
```
This is pretty remarkable. In principle, in the final line above, the compiler can eliminate all the worrisome copies, making sorted_names2 the same object as the one created in get_names(). In practice, though, the principle won't take us quite that far, as I'll explain later.
Implications
Although copy elision is never required by the standard, recent versions of every compiler I’ve tested do perform these optimizations today. But even if you don’t feel comfortable returning heavyweight objects by value, copy elision should still change the way you write code.
Consider this cousin of our original sorted(…) function, which takes names by const reference and makes an explicit copy:
```cpp
std::vector<std::string>
sorted2(std::vector<std::string> const& names)  // names passed by reference
{
    std::vector<std::string> r(names);          // and explicitly copied
    std::sort(r.begin(), r.end());
    return r;
}
```
Although sorted and sorted2 seem at first to be identical, there could be a huge performance difference if a compiler does copy elision. Even if the actual argument to sorted2 is an rvalue, the source of the copy, names, is an lvalue,3 so the copy can't be optimized away. In a sense, copy elision is a victim of the separate compilation model: inside the body of sorted2, there's no information about whether the actual argument to the function is an rvalue; outside, at the call site, there's no indication that a copy of the argument will eventually be made.
That realization leads us directly to this guideline:
Guideline: Don’t copy your function arguments. Instead, pass them by value and let the compiler do the copying.
At worst, if your compiler doesn’t elide copies, performance will be no worse. At best, you’ll see an enormous performance boost.
One place you can apply this guideline immediately is in assignment operators. The canonical, easy-to-write, always-correct, strong-guarantee, copy-and-swap assignment operator is often seen written this way:
```cpp
T& T::operator=(T const& x)  // x is a reference to the source
{
    T tmp(x);                // copy construction of tmp does the hard work
    swap(*this, tmp);        // trade our resources for tmp's
    return *this;            // our (old) resources get destroyed with tmp
}
```
but in light of copy elision, that formulation is glaringly inefficient! It’s now “obvious” that the correct way to write a copy-and-swap assignment is:
```cpp
T& T::operator=(T x)  // x is a copy of the source; hard work already done
{
    swap(*this, x);   // trade our resources for x's
    return *this;     // our (old) resources get destroyed with x
}
```
Reality Bites
Of course, lunch is never really free, so I have a couple of caveats.
First, when you pass parameters by reference and copy in the function body, the copy constructor is called from one central location. However, when you pass parameters by value, the compiler generates calls to the copy constructor at the site of each call where lvalue arguments are passed. If the function will be called from many places and code size or locality are serious considerations for your application, it could have a real effect.
On the other hand, it’s easy to build a wrapper function that localizes the copy:
```cpp
std::vector<std::string>
sorted3(std::vector<std::string> const& names)
{
    // copy is generated once, at the site of this call
    return sorted(names);
}
```
Since the converse doesn’t hold—you can’t get back a lost opportunity for copy elision by wrapping—I recommend you start by following the guideline, and make changes only as you find them to be necessary.
Second, I've yet to find a compiler that will elide the copy when a function parameter is returned, as in our implementation of sorted. When you think about how these elisions are done, it makes sense: without some form of inter-procedural optimization, the caller of sorted can't know that the argument (and not some other object) will eventually be returned, so the compiler must allocate separate space on the stack for the argument and the return value.
If you need to return a function parameter, you can still get near-optimal performance by swapping into a default-constructed return value (provided default construction and swap are cheap, as they should be):
```cpp
std::vector<std::string>
sorted(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end());
    std::vector<std::string> ret;
    swap(ret, names);
    return ret;
}
```
More To Come
Hopefully you now have the ammunition you need to stave off anxiety about passing and returning nontrivial objects by value. But we’re not done yet: now that we’ve covered rvalues, copy elision, and the RVO, we have all the background we need to attack move semantics, rvalue references, perfect forwarding, and more as we continue this article series. See you soon!
Follow this link to the next installment.
Acknowledgements
Howard Hinnant is responsible for key insights that make this article series possible. Andrei Alexandrescu was posting on comp.lang.c++.moderated about how to leverage copy elision years before I took it seriously. Most of all, though, thanks in general to all readers and reviewers!
1. Googling for a good definition of value semantics turned up nothing for me. Unless someone else can point to one (and maybe even if they can), we'll be running an article on that topic—in which I promise you a definition—soon. ↩
2. For a detailed treatment of rvalues and lvalues, please see this excellent article by Dan Saks. ↩
3. Except for enums and non-type template parameters, every value with a name is an lvalue. ↩
- Want Speed? Pass by Value.
- Making Your Next Move
- Your Next Assignment...
- Exceptionally Moving!
- Onward, Forward!
Great article, but my experimentation with VS2012 shows that references are faster in some circumstances.
I tested a small class that printed when objects of that class were Copy, Move or Default constructed. I then used the following two small tests, similar to your vector sorting functions.
```cpp
MyClass s;
MyClass refRet  = DoByRef(s);
MyClass copyRet = DoByCopy(s);
```
With the implementations being:
```cpp
MyClass DoByRef(const MyClass& s)
{
    MyClass ret(s);
    ret.DoSomething();
    return ret;
}

MyClass DoByCopy(MyClass s)
{
    MyClass ret(s);
    ret.DoSomething();
    return ret;
}
```
The results were that DoByRef does a total of one copy construction, and DoByCopy does two. Now this could be a result of all the code being in one module, I'm not sure yet, but it does show that at times passing by reference is considerably faster.
OK looking a little more into it, if I construct ret with a std::move(s) instead, and pass an rvalue to DoByCopy, then I get a total of one move. If I pass an lvalue to it, I get a copy and a move (Slightly slower than just passing by reference which is just one copy). However this only makes passing by value useful if you’re passing in an rvalue. If you pass in an lvalue for some reason, you’re actually slower than just using a reference.
So basically I’m not convinced (yet) that it’s worth the trouble to stop using pass by reference in real code.
Any chance you could fix the broken link at the bottom of the article please?
“Follow this link to the next installment.”
Does the same rule apply if your type is largish (let’s say 128 bytes)? To me, it seems like a pass-by-value would be pretty expensive since the swap or rvalue move will still effectively be a copy, thereby causing 2 copies of the data instead of 1. In the case where you are supplying an rvalue, you end up copying the data twice as well since the move into the local variable will be a copy as will the swap. Thus the pass-by-value case will always involve two copies.
Thus to me it seems like for larger types you should still use const&. If the type can be moved more efficiently than copied, than APIs using it should provide an additional && API.
I heard about this article when it was cited in Going Native 2013 where they were recommending to use value semantics by default.
I can see why this improves performance when dealing with temporaries and/or things that can be moved. There’s a case I worry about though and I wonder what you have to say on the matter.
My worry is that the assumptions are not documented by the signature. The assumption is that I'm going to gain speed because of copy elision, but if I change the calling site so that the elision can no longer be done, by instead passing an lvalue, then I lose that performance. For example, I may find that I want to use the same object again, so I store the temporary into a variable. Now I'm paying the cost of copies when a reference would have served the same purpose.
That in and of itself doesn’t bother me, what bothers me is that this will happen without any warning. The semantic use of the function completely alters the performance, or am I missing something?
What would you recommend to avoid this issue?
Absolutely Brilliant! Thanks!
I will pass by value in most situations.
My brain hurts after reading this. Do things really need to be so complicated in C++?
Not if you don’t care about performance. The price of performance is dealing with issues closer to the machine model. You can cover those issues up with “pretty” language abstractions like garbage collection… but then you have to give the performance back
I sympathise with Paul’s lament.
If anyone was producing a new high-performance language and they considered the use case of a user procedure to initialise a read-only vector of strings would they come up with such a pig’s ear of a solution as C++ has ended up with? C++’s excuse is C.
It's got nothing to do with garbage collection.
You’re wrong. This has nothing to do with C, and it has everything to do with the fact that C++ doesn’t garbage collect. If you think otherwise, go ahead and try to design your high-performance language.
In Java, the above issue doesn't exist since you have pointers for everything, which is pretty neat in my opinion. Then of course, you have garbage collection instead.
I’ve been thinking if you couldn’t do the same with smart pointers, and still avoid having garbage collection. It’s just that smart pointers are so ugly in C++…
In a chess program, moves can be generated and stored in a std::vector. Most programs have split the move generation by piece type, and they pass a std::vector by reference (or pointer) to append the various parts together. Code would typically look like this:
Translating this to C++11 style with std::vector return-by-value runs into the small problem that the standard containers have no infix operators to append containers in the same way as one can do with std::string. Adding a template operator+ that also has a pass-by-value left argument will fix this.
Of course, the use of the operator+ can be debated here. For numeric applications one might also use operator+ to do element-by-element addition.
OK, I guess I should have asked a question to generate a reply. So here’s a question: how can I make the above code avoid all unnecessary copies? Reading from the later installments of this blog series, I figure I need 4 overloads of operator+. Here’s a first try:
It’s different from the Matrix example in this blog because operator+ is not commutative, and it’s also different to the std::string example in the Nxxx standard proposal documents because std::vector does not have a built-in operator+=. So another question is: do I also need to have 4 overloads of operator+= to let std::vector have full append functionality? What signature would they have to have?
Your solution is fine as far as move semantics goes. A reasonable rewrite is to replace all 4 of your signatures with just:
That being said, I would be tempted to just do the following:
This isn’t quite as “cute” but is perfectly efficient. And this also comes with a caveat. If in the original code the client is calling this over and over as in:
Then you might consider leaving your code as is. Count trips to the heap. Whatever minimizes that count is the best solution. Don't throw away vector capacity to then just allocate it back. If you can reuse capacity, doing so is always a win. If moves is likely to hold capacity prior to the call to generate, then attempt to take advantage of it.

Hi Howard,
Thanks for your comment. I make an upfront reservation for the move vector's capacity, so passing the pointer around would minimize the number of heap allocations:
One more question, though: would this still apply if I would use your stack allocator? So with
wouldn’t that make the “+” notation more viable?
IMHO, as the functions are only concerned with “generating values”, instead of knowing about vectors they should just take an output_iterator as parameter and work with it.
Then you’d just make a back_inserter to your vector, pass it around and you’re clear to go.
I think this article should be updated for C++11. There are two things wrong with it:
It leaves the impression that one should always write your assignment operator like so:
But in some important cases, this is a large performance penalty. Vector-like classes where heap memory can be reused during the copy assignment is a classic example. I’ve just written a short example showing as high as a 7X performance penalty.
In C++11 the correct way to write sorted is:
Implicit return-by-move from by-value parameters is now required.
The basic point of the article is sound: Passing by value is an important tool in the tool box. But I’ve seen too many references to this article that mistakenly throw design and testing out the window on this issue, and translate this article into “always pass by value”.
1000% agreed
In the case of C++11, wouldn’t it make sense to always use the by-value version of the assignment operator if a move constructor is provided in addition?
I think you just hammered my first point above home. Trying code:
Thanks for posting the example; that makes your point in the previous comment clear.
But what if an exception is thrown while doing *p = *q; in the "optimal" operator=? We have a vector with partially copied items. Isn't this a case where we are trading safety for speed?
BTW: we have a bug in this vector: in the case where (capacity() < N), the end_ pointer equals begin_ and is not updated by += N below.
I replicated this test using gcc-4.7.2 -std=c++11 -O3
compiled with -DUSE_SWAP_ASSIGNMENT 47 microseconds 48 microseconds 48 microseconds 47 microseconds
without -DUSE_SWAP_ASSIGNMENT 36 microseconds 36 microseconds 36 microseconds 36 microseconds
That yields a .32X performance difference. Could not reproduce your 7x. But even .32X is very significant.
Wow! Tried to understand what's going on here… Am I right in assuming the difference is in the ::new(end_) T(v[i]) copy-construction loop in the copy constructor (used when -DUSE_SWAP_ASSIGNMENT) vs the *p = *q assignment loop when using the "normal" MyVector& operator=(const MyVector&)? But then why such a difference? There's a placement new up there (i.e. no real memory allocations, only string's copy constructor invocation); how can it be that much worse than an assignment? I'm sure I'm missing something… Also, testing on a Mac with the latest clang from trunk and gcc-4.7 built from sources, I can't see that timing difference when compiling with g++ -std=c++11. I.e., both the old and the swap-based assignments time almost the same as clang's best case. Is the different size of std::string (8 bytes in gnu's libstdc++ vs 24 bytes in clang's libc++) the cause of gcc's insensitivity to the kind of assignment used?
Andrea
Think of it this way: the most efficient way to recycle something is to re-use it. The copy assignment operator can sometimes re-use memory, instead of deallocating it and then allocating more. That is what is happening in this example. One way deallocates memory just to turn around and allocate it back. The other way holds on to its memory and re-uses it for the new value. The optimization is to simply avoid calling new/delete as much as you can.
I imagine the difference you’re seeing with gcc is that they are using a ref-counted string. Try the experiment again, but using MyVector<std::vector<int>> instead.
Howard, thanks for your helpful reply. I think I'm getting hold of it now. Can you just confirm my understanding is right when I say that: when using vectors of objects that have "external" resources (i.e. that allocate memory on the heap, as is the case for strings), going through the route of invoking their constructor (as when doing ::new(end_) T(v[i]) in the copy constructor loop used when -DUSE_SWAP_ASSIGNMENT) makes you incur the penalty of allocations even if those are placement news. Instead, the *p = *q assignments in the plain old assignment operator's loop can re-use the memory already allocated at the destination (in particular when, as in this case, the destination is a longer string), making that approach more efficient. As you suggested, using vectors of int (or, I think, more generally PODs/aggregate objects) levels out the difference between the two approaches because that extra price during the placement new does not have to be paid.
Andrea
That sounds right.
Hi Dave, I tried to apply the idiom for copy assignment you describe, but I encountered one suspicious nuance when trying to specify exception specification for my copy assignment. I have a “gut feeling” that something is wrong, although I cannot clearly specify it. Here is my problem. Without passing by value I would specify my assigment like this:
This really says what I need to do to assign the value from one object represented by reference ‘x’. I need to make a copy first, and swap it with my value. Can this operation throw? Surely: a copy constructor is a typical place where one would expect a throw.
Now, is the answer the same for the “pass-by-value” idiom?
Technically we are doing the same thing, but copying is somehow ejected outside of the function. The function only does a no-fail (let’s assume that) swap. So in fact, I can write:
I am telling the truth: there is nothing in the copy assignment that could cause a throw. But anyone who tries to use the assignment may throw, because our function, even though it does not copy itself, forces you to copy T, even though you are not (or may not be) aware of it. That is, by declaring the function like this (with noexcept), while technically being correct, I confuse everyone by implying that using this assignment operator does not raise exceptions. I would be more honest if I wrote:
But this also looks strange: why would I base the condition on the properties of the constructor that I never call?
Hi Andrzej,
I wish this were crisper, but I would say:

- operator= is unconditionally noexcept
- noexcept if T's copy constructor is noexcept
- noexcept if move-constructing a T is noexcept
“Except for enums, every value with a name is an lvalue.”. I know I’m going to annoy you by this, but I want to inform the innocent reader that integer, pointer and member pointer template parameters aren’t lvalues either.
Good point; fixed, thanks!
You forgot about floats. They aren’t lvalues either. And few more other things as well.
No, a named float most certainly is an lvalue.
So let me get this straight: a named integer isn't an lvalue but a named float is?
No, unless that integer is a non-type template parameter, they’re both lvalues.
Yes, he talks about template parameters.
Thanks Dave, missed the part about template params
With sorted3, copy elision seems to be more complicated. As far as the function is concerned, the argument is an lvalue, so unless the sorted function knows that the argument passed to the sorted3 function is an rvalue, it can't perform copy elision. Or have I misunderstood? If this is the case, then it must be capable of interprocedural optimization, right? Why can't it elide when the function parameter is returned then?

Note also the possible semantic difference between pass by value and pass by reference. For example:
by_ref() takes "anything that is a B", i.e. anything derived from B, whereas by_value() takes a B, and B only.

… or so it seems. Actually, assuming a B copy-constructor of the form B::B(B const&), then by_value(d) still works, via an implicit by_value(B::B(d)). B's copy constructor would need to be explicit to avoid this.

Not sure how big of a deal that is, but passing by value (while using explicit constructors) could prevent slicing in some situations.
(In fact, in general, the “slicing” of a Derived when constructing a Base from a Derived might be surprising in some situations. ie I suspect many people don’t think of their copy-constructor being used as a slicer. Or being ‘polymorphic’ in some way. Interesting…)
Forgive me the rant. The article is very good, and I really admire people like Dave Abrahams, who keep in touch with all this, despite the complexity.
It seems to me that C++ has got itself into such a blind alley.
After reading the article, step back and have a look: such a basic thing, passing/returning values from functions. Yet it is so complex, full of traps. It takes many pages to explain, and it requires the reader to have many years of C++ experience to be really able to grasp the explanations, and apply them succesfully.
Can anyone be expected to write reliable software, solving complex problems, if you need 10 years of experience to just reliably return a value from a function, without shooting yourself in the foot? But this article shows only a tip of the iceberg, really. What a complete horror this becomes when you factor in variadic templates, template specializations, overloading rules, overriding rules, lambda expressions, SFINAE rules, type promotions. With this on, you can pretty much never be sure your code is correct, let alone optimal.
C++ has become clever, way too clever for an ordinary programmer to use effectively.
It no longer looks good even on toy problems – too many caveats.
As a result, the average programmer uses C++ in a shoddy way, resulting in buggy, suboptimal code. This is what 95% of C++ programmers in the wild are doing, from my experience.
The remaining 5%, who have OCD and are really determined to do all things right without cutting corners, end up agonizing for hours on every trivial function definition, all in the spirit of the above article. Of course they get nowhere for weeks.
C++ is a tool. Tools are for making people’s lives easier. C++ doesn’t anymore. It creates more problems than it solves.
Your rant is definitely justified. My viewpoint differs in some aspects – so here’s mine:
In defense of C++: Never underestimate “simple”. Once you look deeper into it, it often gets terribly complicated – or in other words, it’s amazing on what a tower of shoulders we stand. To blame the same on other languages: garbage collection is simple – unless you look into it. Who would have thought cleaning up objects would be so terribly complicated?
Picking the wrong way to pass a parameter is rarely shooting yourself in the foot. going by the “simpler rules” (pass by reference to avoid copies) will virtually never be wrong. Even indiscriminately passing by value will be good enough most of the time.
All code of given complexity is shoddy, incomplete and questionable. The question is: is it good enough for its purpose? (Our biggest problem here: change of purpose.) Knowing which corners you can cut and which you cannot makes you a good programmer. The "obsessive about everything" guy you describe is not (and I recognize myself in your OCD description).
Where I agree:
You have discovered the conundrum of choice: we pick C++ because it gives us choice – in the context of the example, we can pass by value or reference or pointer, or we might sit behind a template parameter and even not know how we pass. However, that choice is also what makes C++ hard, twice the choices isn’t always twice as good.
I am unhappy with rvalue references, because they increase the tax on a typical class without a default copy implementation. OTOH, they do solve a problem of C++ while preserving the choices enabled by other features.
The number of features in a language is a fundamental problem for languages. Adding lambdas and templates and exceptions and rvalue references to the language makes my job easier, because I can use them where appropriate. It also makes my job harder, because to understand your code, I might have to learn lambdas whether I like them or not. (And whether I would have used them or not. I think this disparity is a great source of rewrites: great functionality, but they use exceptions, we use error codes. Cool solution, but I really hate template metaprogramming. Etc.)
There is no universally perfect spot. C++ will see less use in large systems and desktop development. C++ will see more use in small microprocessor and embedded systems, because hardware and compilers are catching up just now. There’s still room for C++, and due to its variety, a lot of room.
C++ is a toolbox, not a single tool.
This is a great comment. It well captures the yin/yang of C++ development.
I think the solution is building applications in layers of abstraction. This can permit all or most of the issues related to low-level optimization to be addressed in lower layers while upper layers ignore most of this. This is why I'm a fan of C++. Unfortunately, I think a lot of programming shops miss this and just start coding rather than building applications layer by layer. Using C++ in this way causes lots of frustration and confusion.
C++ is a toolbox for making other toolboxes.
Robert Ramey
Yup. I've been doing C++ for many years, but the companies I worked in used very old versions of the compilers that didn't support C++11. We've only switched a short while ago. Having to read and digest articles like this on something so simple is rather nuts. I, as most of us will, work it out and start to get used to it. However, such a common task becoming so complicated is just insane.
This article was copied verbatim by someone in his blog:
prasanthmadhavan dot wordpress dot com @ /2010/11/26/the-r-value/
Amazingly, without attribution. How did you happen to find it?!
I stumbled on that doing research to convince myself that sorted() is a better style than sorted2(), or rather that you're incorrect and it should be the other way around.
I couldn’t realize at first what this was about, in case anyone was wondering Google has cached it at bit.ly/fJgRBF. Pillaging at its best, unbelievable!!
I’m not sure that I agree with you that
would in general be less efficient, even with copy elision, than:

It seems to me that in either case, at least one copy must be made. With return value elision, sorted2 requires no more than one copy either.
Did I miss anything?
Yes. I’m not sure what, but you did.
When the argument is an rvalue, the compiler is allowed to call the 2nd one with no copying. Now, the fact is that in practice, because of the way compilers implement function calls, it requires a copy today, but with move semantics no copy is needed. To avoid the copy today, you can do something like this:
I can’t see why that is possible. In your example of
The object returned by get_names() is a temp object which will be out of scope when sorted returns. One may argue that in principle, with RVO, sorted() receives the return value object from the caller, which in turn can pass it to get_names(). But such a scheme seems to violate the standard, which requires all arguments to be evaluated BEFORE entering the callee.
The temporary’s lifetime lasts to the end of the full expression. That includes the assignment.
But that's not good enough. You are going to use sorted_names2 after the statement, aren't you? You need to preserve the values by copying them to sorted_names2, right? My argument is that you'll have to make at least one copy of the values, either by using the copy constructor or by assignment. Hence, there's no advantage to using 1.
Once you elide the copy, the source of the copy no longer gets destroyed; the lifetime becomes that of the thing it was “copied” into.
How would that be possible? The “source” of the copy is allocated on the stack and is popped off upon the return of the sorted function.
As noted in the article itself, the compiler allocates space for the return value outside the stack frame of the function doing the returning. It constructs the source there.
To see if I understand your point correctly:
You say that with a good compiler,
there’s no copy for the sort, since the vector temporary is moved into a local, where the sort happens, which in turn is moved into x?
No, I’m saying a good C++03 compiler will neither incur a copy to pass an rvalue into the function, nor will it, in most cases, incur a copy to pass a return value out of a function. There are a few cases where the realities of calling conventions make it impractical to suppress the copy upon return, and this is one of them (think about how the optimization must be implemented and you’ll see why), which explains the need for
swap()
in my example on real compilers, even good ones.

So now you need to do a swap, which creates more chance for the compiler to goof up.
Why not use this good old
std::vector<std::string> sorted2(std::vector<std::string>& names);
signature. The only thing I need to pray for is RVO.
Why don’t you try that with an rvalue argument and find out?
I understood the vector example where an implicit copy is better than an explicit copy. But I’m not sure what to do in this situation:

mystring f();
const mystring s1 = f();
const mystring& s2 = f();
If I want a const mystring, which one of the above should I use, or are both equivalent? I created a custom class and observed that both statements make an equal number of copy-constructor calls: either 0 copies or 1 copy, depending on whether the return value is known at compile time or not. Many of my friends say that catching the return value by reference is efficient. After reading this article I feel that either the first one is better or both are equivalent. Am I right?
Yes, indeed. I tried the following with Microsoft VC++10:
And here is the result:
Clearly, passing by value reduced the number of copies of the object.
I’m a little confused by your results and your comment. The output appears to show that the compiler managed to elide one of the copies only when passed by reference.
For reference, here’s what I get running the same code on gcc4.5.1:
Here passing by value results in only one copy. I’m going to look at the optimisation settings to see if it can do better for either case.
By the way, I tried adding a move constructor to Big. It is only called if I add an std::move to the return statement of Revise. It seems the compiler is failing to identify the return value of Revise as an opportunity to elide the copy (as allowed by 12.8/34 in the FCD), so neither does it implicitly treat the expression as an rvalue (as required by 12.8/35).
Presumably the problem is with trying to elide the copies at both ends (parameter and return).
FWIW, the compiler is supposed to implicitly wrap any by-value returns in
std::move(…)
. If addingstd::move(…)
around your return value makes any difference, that’s either a (knowingly) partially-implemented feature or a bug.

Actually, we also have to make sure the compiler does not inline any of the functions.
There’s already a fairly complete test referenced from this comment
I am curious about one thing: I tried the example you gave above, and slightly modified the nrvo and urvo tests as follows:
Now g++ 4.5 still elides the copy in the urvo case, but not anymore in the nrvo case. It is not crystal clear to me why it is so…
The caller allocates room on the stack for the return value. In
urvo_source
, it can construct a new object there, no matter which branch of theif
is taken. Now, that happens to be the case fornrvo_source
as well, but in general, if the objects are named, they may need to maintain an identity separate from whatever becomes the return value:

The most likely explanation is that when the compiler sees that two different named objects can be returned, it simply gives up on elision and assumes it needs to copy. That’s a cheap way to make the optimization in many cases without performing flow analysis.
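A sketch of the two shapes being discussed (the `urvo_source`/`nrvo_source` names come from the comment above; the bodies are my own guess):

```cpp
#include <string>

// URVO: the returned object is an unnamed temporary; on either branch the
// compiler can construct it directly in the caller-provided return slot.
std::string urvo_source(bool flag) {
    if (flag)
        return std::string("yes");
    else
        return std::string("no");
}

// NRVO: two distinct named objects may be returned.  Without flow
// analysis, the compiler can't pick a single local to alias with the
// return slot, so a simple implementation may fall back to a real copy.
std::string nrvo_source(bool flag) {
    std::string a("yes");
    std::string b("no");
    if (flag)
        return a;
    else
        return b;
}
```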
HTH,
From where I stand (i.e. far away from compiler design…), it just seems that, in the case of nrvo_source(), the same kind of analysis could be performed in each branch of the if(), as in the whole function in your original example. In other words, it should be feasible for a compiler to realise that, in my example, it just needs to construct either a or b at the address passed by the caller.
Of course, there is nothing it could do when the function is rewritten as you show, so maybe it’s not worth the trouble.
Anyways, thanks a lot for your reply, and for this site in general: lot of good reading ahead
(I have no idea why my code was not properly highlighted: I did use the tilde fences, though)
Excellent article. Read it twice.
Cheers!!
Hello,
I’ve been trying to use value semantics a lot recently. It is an interesting way of coding, but sadly copy elision (at least as it is currently implemented in compilers) is too restrictive in many cases. For instance, if I have:

struct Wrapper : Base { Wrapper(Base const& b) : Base(b) {} };

this constructor will always cause b to be copied (or moved, but for this post I am interested in types where move isn’t faster than copy), even when it comes from a temporary. Sometimes jumping through hoops with emplace-like constructors allows one to work around this, but not always. Also, if I have:

struct NonAggregate { std::array<Type,5> member; };

(it is not an aggregate because I will give it constructors), I can’t find a way to initialize member without copying all the elements (for an aggregate I could write Aggregate obj = {{Type(), Type(), …}};).
Now all of this would change if compilers analyzed their AST and, for every temporary object that is referenced only once and for which that reference is a copy, they collapsed this branch. Obviously I am missing some rather large details, but I believe something like that is necessary if we really want to move towards more value semantics.
I just noticed that this is precisely Core issue 1049 (great! Go Jason!) whose priority sadly got lowered
((
Maybe this should be reworded like so: “Unlike lvalues, which can always be used on the left-hand side of an assignment (if the lvalue is non-const)”.
Good point, thanks! Fixing…
I once wrote a short blog post on Value Semantics: http://blog.vmathew.in/value-semantics
Reading old articles…
Minor comment – if you define a function move() as:
Then you can write:
Sean
So what’s wrong with the following (note the &)?
I think you meant
Assuming get_names returns by value (and not a reference) and you’re dealing with a “good” compiler there should not be a difference between the two. In the first case (#1) the compiler can make the function construct its return value directly in that space that will be referred to by the name “names”. In #2 the return value that lives on the stack gets an extension of its life-time and a reference to it is created. A good compiler doesn’t even need to allocate space for the reference but I’m not sure if most compilers are that smart (even though they could be).
If get_names returns a reference it makes a big difference, of course. Is the following code safe?
Currently, it is safe. It’s not safe in C++0x (according to the current draft), because operator+(string&&, char const*) returns an rvalue reference, so you get a dangling reference.
In my opinion, people should not write #2 instead of #1 just because they think they can save a copy. Many current compilers successfully elide the copy in #1. Also, they should not declare functions that return rvalue references (I really hope operator+(string&&, char const*) and the others will be fixed) for “temporary recycling”, because it opens up the possibility of dangling references.
Cheers! SG
Trick question? It’s a syntax error: you can’t cv-qualify the reference itself.
Thanks for the article, Dave!
A very minor remark: your recommended copy-and-swap assignment implementation cannot do a fast assignment to itself. Fast self-assignment could be achieved by adding an extra check to the “canonical” version:
T& T::operator=(T const& x) {
    if (this != &x) T(x).swap(*this);  // plausible reconstruction; original body was lost
    return *this;
}
Of course, the speed of self-assignment is rarely relevant. But I find it slightly counter-intuitive, having a self-assignment that might fail! But if a user doesn’t even have enough memory to assign something to itself, she’s probably in deep trouble anyway!
…which would penalize the usual case and complicate the code in order to optimize a rare case, which is almost always a bad idea.
These self-assignments never have the form
x = x
anyway (nobody knowingly does a self-assignment except in test suites). They’re almost alwaysx = y
cases, where x and y may or may not refer to the same object. That means we’re in code that has to cope with an exception anyhow. There’s really zero advantage in making self-assignment a no-throw operation.

Penalize? You have 3 trivial ops vs. a heap allocation.
You are right that it increases complexity, and typical scenarios are ‘x=y’.
I also agree that the test doesn’t help much if your assignment is copy-and-swap. Still, I’d put it in almost by habit, since many assignment-operator implementations require careful analysis to prove self-assignment is correct.
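For concreteness, here is a minimal sketch of the copy-and-swap style under discussion, on a hypothetical resource-owning `T` of my own (not from the article):

```cpp
#include <algorithm>  // std::swap (C++03 location)

class T {
    int* data_;  // heap-owned resource, to make copies observable
public:
    explicit T(int v = 0) : data_(new int(v)) {}
    T(T const& x) : data_(new int(*x.data_)) {}
    ~T() { delete data_; }

    void swap(T& other) { std::swap(data_, other.data_); }

    // Copy-and-swap: strong exception guarantee, no self-assignment test.
    // Self-assignment makes one redundant copy but stays correct; the
    // "canonical" alternative adds `if (this != &x)` to skip that copy.
    T& operator=(T const& x) {
        T tmp(x);
        tmp.swap(*this);
        return *this;
    }

    int value() const { return *data_; }
};
```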
Great article, but could you also include a conclusion with an example of what we should do and not do (i.e. best practices?). Your article seems more like a discussion than a “good rule to follow”, which makes it difficult to pick out the important parts. But definitely thanks for the article. Keep posting more.
Thanks for the feedback! Speaking generally, I think the popular C++ literature is long on rules and short on insight, and I’m not naturally inclined to boil things down to a prescription, so it’s great to know when the important stuff doesn’t stand out.
But it’s just a guideline; as I explain in this comment, you’ll end up generating bigger code if you often pass lvalues that way, and bigger code, in some circumstances can be slower code—or it can just be unacceptable because of its size. So maybe you can see why I don’t dispense a lot of rules.
That guideline appears a bit wrong. For example, with that kind of code
It would be silly to pass t by value.
So I believe it should be something like:
See my reply here.
Or rather, pass by value any arguments that you would otherwise copy explicitly and which you don’t want to control the storage of the copy of.
Indeed, for something like vector::push_back, passing by value is useless, even though you want to explicitly copy the argument.
It’s not entirely useless if you know something about the type being passed, like how to move-construct it or that it can be emulated cheaply with default construct and swap
It seems that your new style copy assignment operator don’t mix well with rvalue reference in MSVC10 and in gcc 4.4 :
Hold your horses, people! All (well, much) will be revealed when we cover rvalue refs in the next installment.
Greetings from Russia! It seems to me that your blog posts “Move it with rvalue references” and “Your next assignment” are somehow broken, because I can see neither the articles nor the comments. That’s why I can’t figure out whether one really needs a move constructor at all, since an lvalue will get a copy to swap with (due to by-value argument passing) and an rvalue will be treated as the argument itself to, again, swap with, if the compiler does the copy elision! Am I missing something?
Oh, sorry, I surely meant there might be no need for a move assignment X& operator=(X&& x);, not a move constructor.
Hi, these are a couple of unconnected thoughts that haunted me after reading your article.
(1) The copy-and-swap idiom. It implements copy assignment in terms of swap. But the default swap function is implemented in terms of three copies. It is all fine if you always implement a customized swap for your classes, but if you don’t, you are risking infinite recursion. Wouldn’t it be safer if the idiom were always accompanied by a note saying that you need to implement swap too?
(2) In C++0x we will have lvalue references, rvalue references, and values (non-references). The Cartesian product with the const/non-const qualifier gives 6 possible ways in which we can define function arguments:

1. fun( YourClass v ); // by value
2. fun( const YourClass v ); // const copy. useless??
3. fun( YourClass & v ); // output parameter
4. fun( YourClass const& v ); // sort of by value, but no copying
5. fun( YourClass && v ); // temporary that I can change
6. fun( YourClass const&& v ); // useless – #4 would do
Numbers 2 and 6 are probably useless, but that still leaves us with four. Even if we do not want to talk about rvalue refs right now, that leaves us with three, which is one too many. Having spent a number of years programming in C++ I do not find it strange any more, but if you look at it from a newcomer’s perspective there should be only two: either I will change your object, or not. I think Andrei Alexandrescu pointed that out somewhere in the discussion groups. It could be only #1 (interested in the value) and #3 (interested in an object, i.e. a memory location). The other two (4 and 5) are just performance tweaks, aren’t they? Well, #5 is also about unique ownership, but half of its job is still performance, isn’t it? It is troublesome that you have four choices, and after your article, it is clear that it is not clear which one to choose. We may think that we are optimizing, but in fact we inadvertently pessimize. If we had only two options and the support of copy elision, move semantics, and perhaps something newer and even more powerful, we could just write:
fun( set<vector> data );
and be sure that we never add any slowdown. I am not sure what point I am trying to make here, but I’m pretty sure I want to make some point. The two functions below have different argument types. Trying to pick one when matching overload candidates would be ambiguous, but they are still two types:

1. fun( YourClass v );
2. fun( YourClass const& v );

Why do we need #2? Because it is sometimes faster. Why do we need #1? Not sure. Because it is sometimes faster? Perhaps we do not need #1 at all? If we discard it, we can change the syntax of #2 to
fun( YourClass v );
This is the same as #1 used to be, but since we discarded it there is no ambiguity. If there is some programmer’s knowledge required to perform the optimization, shouldn’t it rather be provided via attributes:
fun( YourClass [[copy]] v );
But do we need even that? Is the compiler simply not smarter than us?
(4) Value semantics. It has great support in C++, e.g. in the form of an implicitly defined copy constructor/assignment. A couple of things were suggested to the Standards Committee to make it even better. I just wanted to list them here:

1. An implicitly defined comparison operator (that is, a logical conjunction of member-wise comparisons). It was mentioned in N2326, but never really proposed.
2. The definition of “the same”. In N2479. It has the status “Outstanding issues” – not sure what that means.
3. Not generating copy operations implicitly for classes with non-trivial destructors. Proposed in N2904. No idea what its status is.
Regards.
Heh, just a couple?
This is a whole article unto itself! Thanks for your contribution; I may have to respond in pieces.
An assignment, I think.
A copy and two assignments, but…
…point taken.
I’ll try to get to the rest of your material soon, but let me say now that a lot of what you’ve written sounds a lot like thoughts I‘ve been having lately. I ask the question this way: what would a language that was designed to support mutable value semantics look like?
(1) That’s why it’s customary to use the member swap in the copy-and-swap idiom:
T& operator =(T t) { t.swap(*this); return *this; }
No chance of infinite recursion, unless you implemented member swap in the canonical way, and that would be stupid.
(2) Const rvalue references are useless to the programmer and should never appear written in a program. (They can appear through template deduction.) Const by-value arguments are also useless, so as you say, we’re down to 4 variants. Let’s leave rvalue references out for the moment. We deal with by-ref, by-val and by-const-ref. Combining the usual C++ wisdom with this article pretty much leads to these guidelines:

- Unconditionally use by-ref for out or in-out parameters. However, reconsider whether you need out parameters, because returning might be just as efficient.
- Unconditionally use by-val for arguments that are cheap to copy (primitives) and by-const-ref for arguments that are not copyable.
- This leaves arguments that are copyable, but expensive to copy. This article essentially says that you should pass these by-val iff you plan to modify them inside the function, but don’t want the modifications to be visible outside.

The downside of this approach is that you leak an implementation detail into the interface: if you have a traditional assignment operator, you should pass the argument by-const-ref, but if you convert it to a copy-and-swap assignment operator, you should change it to by-val.
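The third guideline can be illustrated with a small sketch (hypothetical functions of my own, not from the article):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Read-only use, no copy wanted: pass by const reference.
std::size_t count_empty(std::vector<std::string> const& v) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i].empty()) ++n;
    return n;
}

// This function would copy its argument anyway in order to modify a
// private version, so take it by value and let the compiler elide the
// copy when the argument is an rvalue.
std::vector<std::string> reversed(std::vector<std::string> v) {
    std::reverse(v.begin(), v.end());
    return v;
}
```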
Your guidelines are clear and fine. But what I was writing about was more fantasizing about how a more perfect language could look. In fact, one aspect could be achieved only by an even more advanced compiler-optimization technique. As I have little (i.e. none at all) familiarity with compilers, I may really be fantasizing, but just consider:
FatCopy fc = prepare();
read1( fc );
read2( fc );
Two read functions do not alter the parameter; we are used to declaring:
void read1( FatCopy const & fc );
This is in order to avoid copying. This, in turn, is because we are used to think that:
void read1( FatCopy fc );
means copying. The way I have been taught C++, one thinks that the above line means passing data by copying, unless copy elision is employed. How about changing our thinking to “passing data by value”: there will be no copying, unless the function really changes the value fc and copying is really unavoidable. The compiler will decide to use your copy constructor only if necessary, and in situations where you would need a copy anyway. This could be achieved as follows: the compiler always compiles
void read1( FatCopy fc );
as passing by reference, and if it finds that fc is modified by read1, it marks this fact “somehow” in that function’s metadata; later, while linking, it adds a copy at the call site, so the calling function would be compiled to:
FatCopy fc = prepare();
FatCopy __copy = fc;
read1( __copy );
read2( fc );
The copy happens only if read1 modifies fc. Otherwise there is no copy whatsoever.
Regards.
I would call this a form of “copy on write”, at the compiler level. If you were looking for a name/idiom.
P.S. sounds like a good idea to me.
Actually, I’ve been talking/thinking about exactly Andrzej’s idea for about a year now, and calling it “compile-time copy-on-write.” I think it’s time for the article about ideas for the “ideal language in the spirit of C++.”
Maybe, but it’s probably not as scary as you think. Outside
namespace std
, an unqualified call toswap
won’t findstd::swap
unless you’ve explicitly brought it into scope with a using-declaration.

Dave, et al. Thanks much for this site and the material so far.
Nice to hear from you, Brad! And, you’re most welcome. Thanks for being a part of what has been a very gratifying response so far.
One consideration about the following:
In principle, the fact that the function makes a copy is an implementation detail, which is invisible if you use
const vector&
. When you switch to pass-by-value, you are effectively exposing your implementation in the interface of the function, so in case you later come up with another solution that doesn’t involve copying, you’ll switch to a reference, with the obvious signature change that at least will require your clients to recompile (fortunately, their code won’t change unless they are using the signature explicitly, say as a function pointer). Without this, they would just replace the .so/.dll with the new version.

It’s not a major problem, just a point to consider. After all, almost no explicit optimization comes at zero price. (Btw, if elimination of reference-bound temporaries were allowed, it would work in this case as well!)
First, yeah this is just another manifestation of the code size issue explained in this comment.
However #1, I have a hard time seeing a switch to pass-by-value as exposing an implementation detail. I’m not sure why; it may just be a gut reaction, but my orientation is toward pass-by-value as a default, and pass-by-reference as an optimization. Nothing obliges the function to modify or steal resources from the copied value, and the copy ought to be side-effect-free. Okay, I’ll admit I’m flailing about in the dark hoping to hit the right explanation for why it’s not an implementation detail. Let me give that some thought.
However #2, I totally disagree that legalizing your optimization—which I won’t even call “elimination of reference-bound temporaries” because you surely would not want that to be allowed in all cases—would help with the recompilation problem. It’s like I said earlier: your optimization depends on being able to see inside the called function, which is in conflict with the separate compilation model.
Hi, your “However #1” is somehow very inspiring. When I write the function:
int double1( int const& i ) { return 2*i; }
and then change it to:
int double2( int i ) { return 2*i; }
No-one will say that I exposed any implementation detail. I just want a value. In fact, I would probably never write double1, because double2 is so natural: “just give me this integral value”. double1, on the other hand, says: “I will take it by reference”. Imagine a function call:
return double2(5);
Who is interested in knowing that you will have yet another name to refer to the 5? I just want to double it. Now, ‘const’ means “I will not try to change it, so pass me literals and temporaries as well”. This is “weird” too. I wasn’t asking whether you would be changing it or not; I just wanted to double the number, but now I have to puzzle over whether someone will mutate my value or not. And in fact “int const&” somehow isn’t as mutation-proof as “int” alone. You can cast away const, but you cannot cast the copy back onto the original.
Also the change from double1 to double2 only exposes a copy operation (and the destructor). Not any other function. Copy operation is special to that extent that compilers implement it for you for your own classes. Function:
void fun( YourClass cc );
It doesn’t expose any function of YourClass. It simply requires a value. Well, I know these are just some loose thoughts.
Regards.
Here is why: If you header file looks like:
class B; class A {public: void Foo(B& b);};
The user does not need to know the definition of B. On the other hand, if your header file looks like:
#include "B.h"
class A { public: void Bar(B b); };
Now the user of your class is forced to know about B.
It seems you’re going to “blogify” the “RValue Reference 101″ article which is nice because it is currently inaccessible to users who don’t have a trac account at boostpro.
Cheers! Sebastian
It started with that article, but it’s being extensively revised and expanded.
Hi Dave,
Could you please comment on this note in the Standard (12.8/15) “when a temporary class object that has not been bound to a reference (12.2) …” Why there is this “has not been bound to a reference” constraint? Consider this (no idea how to get syntax-highlighted code here, a “how to” link would be great):
Due to the constraint,
holder_1::s
can’t be initialized directly from “some string”, and a temporary string will always be created.

And this constraint appears to be in the latest draft as well.
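The holder_1 code itself didn’t survive the page extraction; here is a hypothetical reconstruction consistent with the discussion, where the reference constructor parameter is what triggers the constraint:

```cpp
#include <string>

// The temporary string built from "some string" is bound to the
// reference parameter before it initializes s.  Under 12.8/15, a
// temporary that has been bound to a reference is not a candidate for
// copy elision, so the copy into s must actually happen.
struct holder_1 {
    std::string s;
    explicit holder_1(std::string const& x) : s(x) {}
};
```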
Thanks
Please see the new “posting” tab.
I don’t think I agree with your analysis of this example. The type of
"some string"
ischar[12]
, and it must be converted to astring
in order to match the signature ofholder_1
‘s ctor in line 7 long beforeholder_1::s
is initialized. There’s no opportunity to use the ctor in line 4 as far as I can tell. Am I missing something?

Well, to me, copy-ctor elision is (conceptually) something like “get rid of creating a temporary if it’s being used only to initialize an object of the same type” (please correct me right here if I’m wrong), and the wording in the standard is, well, just wording, which is subject to change (e.g. there are two more allowed cases in the latest C++0x draft compared to 2003). To do the elision, the compiler should look ahead anyway, to check the usage of the temporary. E.g. here:
string(const char*)
->string(const string&)
, but the compiler “looks a bit further and sees” that the copyctor initializes from a temporary, and gets rid of it (of course, making all necessary checks about copyctor availability).I don’t see why it can’t be done here as well, as the compiler has all code, everything is inline etc. It’s an optimization, and the compiler should be clever enough to look ahead (and we know they are in many cases, like link-time optimization). But here it’s simply forbidden by the standard because the temporary has been bound to a reference. That’s why I’m asking why do we have this constraint.
OK, I understand what you’re asking, though I’m not at all sure that eliminating the text in question would be enough to allow that particular optimization. For what it’s worth, your idea is in a completely different class from today’s copy elision, because the current optimizations only “look a bit further” at the call site, and don’t require looking inside the callee as yours does, which in principle is possible though it may be too late by link time, practically speaking (too much high-level information missing by then).
I don’t really know why we have that constraint (“see D&E” is my stock answer), but if I had to guess I’d say it was there to prevent lvalues from being mutated in scenarios like
If line 5 mutated
a
that would be pretty surprising.

I read somewhere that it is because the committee agreed (for C++98) that some use cases exist where monitoring the number of copies is useful. I’m completely opposed to this; the programmer who needs to monitor the number of copies also needs to know the strange cases where the copy is allowed to be avoided.
I would be very surprised if the particular optimization Maxim was asking for was ever considered, and even more surprised if it were ruled out on those grounds. That seems completely inconsistent with the intent of copy elision.
Yes sorry Dave, I misread it.
However, Alex Stepanov, in his notes and his latest book, smartly points out that if we could force construction, copying, and equality to retain their expected semantics, the compiler could apply that optimization.
The committee, preferring freedom, didn’t enforce any semantics. It means that
struct s { s(const char*); s(const s&); };
bool operator==(const s&, const s&);

s a1 = "123"; s a2 = a1;
(a1 == a2); // could be false, the programmer could rely on it being false!

then we simply aren’t allowed to rewrite s a2 = a1; as s a2 = "123";
I think you mean “the programmer couldn’t rely on it being true!” In any case, yes, Elements of Programming and regular types will definitely be topics in upcoming articles.
a1 is not a temporary here, so it doesn’t apply. Here is my conceptual understanding of copy elision:
In your example a1 is also used later in comparison (and it’s not a temporary object at all), so it can’t be eliminated.
Rodrigo, Dave, do you agree with this conceptual definition for the copy elision?
In any case yes.
One thing I hate about C++ is the copy-elision rules. It would be easier if the problem you point out were surely caused by a weak compiler instead of a C++ rule.
While I’m not completely sure if your optimization is currently allowed in C++, for me it makes sense.
Btw, my point was that a copy of an object could be different from an object created using the same initializer as the source.
Well said, Dave! What they probably mean is that they accidentally do URVO in debug mode. I wonder what happens if someone reported THAT as a bug
Well, I mentioned link time just to emphasize the power of present day optimizers. For the code in question, all analysis can be done in compile time. It’s optimization, which is never mandatory, so we should be ready for the cases when it doesn’t work (e.g. it falls to link time).
It looks like you have meant
const X& a = produce();
, right? Something strange happened with the markup.

Well (all of the following is not strictly according to the standard definitions),
a
is not a “real temporary” here: the object returned byproduce
is, but when you bind it to a reference that extends its lifetime, it’s not a temporary anymore in terms of elision, since you explicitly say “I want this object to live beyond the full expression it was created in”. OTOH, this applies to all other references as well (e.g. references in parameters), so probably they should also be subject to elision.

As long as
a
is used (and only used) to initialize a temporary (I believe this is the necessary precondition for the elision) in theconsume
‘s parameter, I see no problem with thea
optimized out.

But this should probably also mean that we don’t rely on the destructor of
X
(obvious usage is the Guard idiom).I need to think more about it, I still don’t have clear picture.
I think there is another version of that. Consider:
Since exception objects are temporaries too, the restriction about reference binding makes the irritating case above impossible.
Hi there. A similar example where we would mutate a temporary that has got a name:
We would accidentally modify the exception object. I think the restriction on reference binding generally ensures that we can’t refer to the temporary object a second time.
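The example code was lost from this comment; here is a sketch of the kind of situation being described (my own reconstruction):

```cpp
#include <string>

// A caught exception object is a temporary bound to a reference.  If the
// copy below were allowed to be elided, s would alias e, and mutating s
// would mutate the exception object itself; a later rethrow would then
// observe the mutation.  The reference-binding restriction rules this out.
std::string demo() {
    try {
        throw std::string("boom");
    } catch (std::string& e) {
        std::string s = e;  // must be a real copy, never elided
        s += "!";           // must not affect e
        return e;           // still the original "boom"
    }
}
```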
Hello Dave, sweet article; it deepened my understanding of so many things. Thanks a lot.
A question now, given this piece of code:
Output:
Meaning no copy due to elision, and not even a move. This code compiles (GCC 4.7) even if the copy constructor is explicitly deleted, but isn’t it required in theory? Is this standard behaviour? On the other hand, it doesn’t compile with a deleted move constructor, even though the move isn’t used, just like the copy.
Thanks
Note that you can disable elision with -fno-elide-constructors to see the difference. There is no copying in your example, only moves. Where do you expect you might need a copy construction?
Huh, that’s right. So the compiler seems to request a copy constructor even with the elision. Is that right?
Well, I was trying to have a function like the sort in the article and see what happens when I call it. The memory allocation is there so I have costly things going on when copy elision is not done.
Basically, is there a 2-3 class example that, when compiled, says: look, elision here, no elision there?
Try this example, whipped up at home with loving hands.
Congrats to the cook, then. This is utterly perfect. Tested with gcc 4.3; I only miss one elision in the “Return rvalue passed by value” case. On 4.1, all elisions fail. Gotta test MSVC.
Thanks again
OK. So I wanted to see if I can “check” that these things work with my current g++. So, I wrote this: http://codepad.org/craBkxXL
The output for gcc 4.3 is:

non-const call
A new : 4
A delete : 6
A copy : 3

non-const call
B new : 4
B delete : 6
B copy : 3
The non-const call is there to “emulate” the sort function from the article.
So, does this mean the copy stuff is done, or am I not calling the proper thing and hence not triggering the mechanism, or is gcc 4.3 not copy-elision aware (which I doubt)?
I think I’m just not doing the correct thing to check this. So how should I butcher this so I can validate the use of copy elision and show it to an unbelieving co-worker?
Joel, it’s a little hard to tell what you’re trying to demonstrate with this example. It looks vaguely as though you’re trying to show something about const vs. non-const member functions, but that distinction doesn’t make any difference to copy elision, so maybe I’m misunderstanding. Try cutting it down to the absolute minimum (e.g. remove all that memory allocation stuff) and if you’re testing several different things, separate those tests as well.
The behavior depends on compiler options and on the way the temporary is obtained: http://ideone.com/QYYZKl
gcc 4.3 non-const call A new : 5 A delete : 7 A copy : 3
non-const call B new : 3 B delete : 5 B copy : 1
Very interesting article!
Sure, copy elision really messes with our C++ programmer’s “common sense”. It’s quite surprising to find a better way of writing the assignment operator after all these years. There are certainly a lot of textbooks to correct.
However, exploiting copy elision beyond that, especially for return values, seems a bit fragile to me: 1) it’s hard to check whether RVO really takes place, short of adding I/O to the constructors; 2) turn on debug mode and all these nice copy elisions disappear (at least with MSVC).
“we have all the background we need to attack move semantics, rvalue references, perfect forwarding, and more as we continue this article series. See you soon!”
Great! I’m really looking forward to seeing that. There are a lot of resources on move semantics out there, but it’s still very confusing to me. For example, I’ve yet to see a straightforward explanation of the correct and efficient way of writing functions like “sorted” in the presence of move semantics.
Any side-effect will do; you can increment a counter.
I just did a quick test with the MSVC 2010 beta, and for that version you’re half right (elisions of arguments passed by value still happen in debug mode; update: see below). That’s a bit surprising, actually: copying the return value doesn’t make debugging much easier (especially when argument copies are still getting elided), is likely to mislead people looking for release-mode performance, could actually make debug-mode performance unacceptable even for testing, and the elision doesn’t take significantly more compile-time resources. Once you implement RVO, it seems like more work to leave a branch in the compiler where it’s disabled.
It is true that copy elision isn’t guaranteed by the standard, so vendors are free not to implement it, or to turn it off depending on compiler options. But is “fragile” really the right word? It’s not as though any vendor who implements copy elision can afford to break it in their next release. Also, given the lack of standard guarantees, I don’t see why you’d be more worried about exploiting return value elisions than argument copy elisions.
Actually, I take it back; you’re a quarter right (sorry)! MSVC implements the URVO in debug mode (it will return an unnamed temporary without copying) but not the NRVO. Silly compiler; I’ve submitted a bug report, FWIW.
And… it turns out not to be worth much at all: http://tinyurl.com/silly-compiler
Hi Dave,
Thanks for the great article. I however have a slightly deeper question regarding rvalues, move semantics, and copy elision:
How do you ensure that the object passed by value is definitely copied and the copy is not elided by the compiler? Also, how then do you make sure that you really do have a copied object instead of a move-constructed object as a function parameter?
void foo(T t); // how to make sure t is copy constructed, instead of move-constructed?
If you want to be sure to avoid copy elision (why would you want to do that?) then you need to pass an lvalue. Copies of lvalue function arguments are never elided.
Well, we haven’t even touched move construction yet, but as long as you’re bringing it up here, again I wonder why you’d want to do that? The answer is the same: pass an lvalue.
Very interesting article. I tend to “const &” by default and this article reminds me that this is not as optimal as I thought.
I’m intrigued by how we can check, for a given compiler at hand, whether it’s actually doing copy elision and generating the proper code when we use such idioms. I guess adding I/O in the constructor doesn’t help, as it’ll force the ctor to be called. Do you just check the assembly output, or what?
Moreover, does this mean that we can completely and forever dump the old pass-by-const-reference copy operator=?
Actually, no! Copy elision is explicitly allowed even if the copy constructor and destructor have side-effects. I/O is a great way to see these effects in action.
Actually, yes! Dump it yesterday.
Wow, I’ll give it a shot. Be sure I’ll update my C++ lessons to incorporate this.
Whoa, seriously?! Huge regression, if so (4.0 elides as expected). Are you sure you don’t have -fno-elide-constructors in the spec file or something?
Well, I just did g++-4.1 ellide.cpp -o ellide. I’ll check with 4.2 and 4.4.
So did you check the spec file? BTW, elision works with 4.3
Note: you’ll have to bring the pass-by-reference signature back when you start writing move constructors; the pass-by-value idiom creates an ambiguity otherwise.
Could you elaborate further on this paragraph?
“First, when you pass parameters by reference and copy in the function body, the copy constructor is called from one central location. However, when you pass parameters by value, the compiler generates calls to the copy constructor at the site of each call where lvalue arguments are passed. If the function will be called from many places and code size or locality are serious considerations for your application, it could have a real effect.”
I don’t quite follow it. Good article though, thanks!
@dvi: first, you’re most welcome, and thanks for asking for clarification; it helps to know when I fail to connect.
So here, f takes its argument by value, and does whatever it does… but it doesn’t copy a because (copy elision aside) it already has a copy of whatever was actually passed. In this case, the compiler has to generate calls to X’s copy constructor in the body of g and h at lines 5 and 10. That’s a total of two calls. Probably not a big deal, but in some embedded applications, for example, there’s limited space available for code.

Now compare with what happens when f takes its argument by reference and copies it: now there’s just one call to X’s copy constructor, in the body of f. The exact same definitions of g and h still work, but calling f no longer involves copying at the call site.

Does that help?
Ah yes that makes it clear. Thanks again